A Language Modeling Approach to Metadata for Cross-Database Linkage and Search

Authors

  • W. Bruce Croft
  • James P. Callan
Abstract

This research demonstrates that language models are a sound and effective foundation on which to build large-scale, distributed information systems for government applications. It offers an alternative to human-generated metadata for locating information resources. Manual indexing is expensive, and studies show that people index inconsistently and inaccurately, which leads to poor retrieval effectiveness. Generating content descriptions automatically from the markup and structure of documents is less expensive and, when coupled with good search techniques, can locate relevant information more consistently. The evaluation testbeds for our research have been government databases such as those found in FedStats and GPO.

Similar resources

Recognition and Transliteration of Proper Nouns in Cross-Language Record Linkage by Constructing Transliterated Word Pairs

Proper nouns in metadata are representative features for linking identical records across data sources in different languages. To improve the recognition of proper nouns in metadata and obtain their transliterations, we propose a method to construct bilingual transliteration word pairs, in which transliterated words in the target language are back-transliterated to their original words in sourc...

Design and Implementation of a Comprehensive Database of the Written Heritage of Science and Technology

Purpose: This study aims to design and implement a comprehensive database of the written heritage of science and technology in the Regional Information Center for Science and Technology (RICeST) and determine the metadata elements required to describe the manuscripts. Method: This study was carried out by the content analysis method to identify the metadata elements needed to describe the coll...

Metadata Enrichment for Automatic Data Entry Based on Relational Data Models

The automatic generation of data entry forms from relational data models is a well-known idea that is being discussed more and more, given the popularity of agile methods in software development and the accompanying development of programming tools. One of the requirements of the automation methods, whether in commercial products or the relevant research projects, ...

Investigating the Reaction of Web Search Engines to Metadata Records Based on a Combined Microdata and Linked Data Method

The purpose of this research was to determine how Web search engines react to metadata records created with a combined method of Rich Snippets and Linked Data. 200 metadata records in two groups (100 records as the control group with the normal structure and 100 records created based on microdata and implemented in RDF/XML as the experimental group) extracted from the information gatewa...

A Study of E-book Databases with an Emphasis on Metadata

Introduction: With the exponential growth of electronic resources on the Web, the application of metadata has enhanced the precision of retrieval and facilitated the search of electronic resources. Hence, the aim of this study was to determine the application of metadata in e-book databases. Methods: This study is an applied work, which was carried out through survey methods in 2013. The pop...

Publication date: 2004